Clustering Algorithm for Text Classification Using Fuzzy Logic

Author

  • Prakash Rao
Abstract

Feature clustering is a powerful method for reducing the dimensionality of feature vectors in text classification. We propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters based on a similarity test, so that words similar to each other fall into the same cluster. Each cluster is characterized by a membership function with a statistical mean and deviation. Once all the words have been fed in, a desired number of clusters has been formed automatically. We then have one extracted feature per cluster: a weighted combination of the words contained in that cluster. With this algorithm, the derived membership functions closely match and properly describe the real distribution of the training data. Moreover, the user need not specify the number of extracted features in advance, which avoids trial and error in determining an appropriate number. Experimental results show that our method runs faster and obtains better extracted features than other methods.

Index terms: fuzzy, training data, cluster, membership function
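
The abstract gives no formulas, so the following is only a minimal sketch of how a self-constructing fuzzy clustering pass of this kind might look, not the authors' implementation. It assumes that each word is represented by a numeric pattern vector (for example, per-class probabilities), that the membership function is a product of one-dimensional Gaussians, and that a word joins its best-matching cluster only when the membership reaches a threshold. The names `rho` (similarity threshold), `sigma0` (initial deviation) and the soft weighting in `extract_features` are all hypothetical choices made for illustration.

```python
import math


class Cluster:
    """One fuzzy cluster described by a per-dimension mean and deviation."""

    def __init__(self, pattern, sigma0):
        self.n = 1
        self.mean = list(pattern)
        self.m2 = [0.0] * len(pattern)  # running sums of squared deviations
        self.sigma0 = sigma0            # assumed floor so the width is never zero

    def deviation(self):
        # Population standard deviation per dimension, plus the assumed floor.
        return [math.sqrt(m2 / self.n) + self.sigma0 for m2 in self.m2]

    def membership(self, pattern):
        # Assumed form: product of 1-D Gaussian membership functions.
        total = 0.0
        for x, m, d in zip(pattern, self.mean, self.deviation()):
            total += ((x - m) / d) ** 2
        return math.exp(-total)

    def add(self, pattern):
        # Incremental (Welford-style) update of mean and squared deviations.
        self.n += 1
        for i, x in enumerate(pattern):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])


def self_constructing_clusters(patterns, rho=0.5, sigma0=0.1):
    """Feed word patterns one at a time; clusters are created on demand."""
    clusters, assignments = [], []
    for p in patterns:
        scores = [c.membership(p) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < rho:
            # No existing cluster passes the similarity test: start a new one.
            clusters.append(Cluster(p, sigma0))
            assignments.append(len(clusters) - 1)
        else:
            clusters[best].add(p)
            assignments.append(best)
    return clusters, assignments


def extract_features(doc_counts, word_patterns, clusters):
    """One feature per cluster: membership-weighted sum of the word counts."""
    return [
        sum(c.membership(p) * cnt for p, cnt in zip(word_patterns, doc_counts))
        for c in clusters
    ]
```

Because clusters are only created when the similarity test fails, the number of extracted features falls out of the data and the threshold instead of being fixed beforehand, which is the property the abstract emphasizes. A lower `rho` in this sketch would merge more words into fewer clusters.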

Similar resources

A New Approach to Classify Text based on CosFuzzy Logic

Evaluating objective-type examination questions is easy to automate, but evaluating descriptive-type answers is more difficult, and no significant research on it has taken place. In this paper I propose a new solution to this problem: text classification using a new fuzzy logic named CosFuzzy Logic. Document clustering is a useful technique that organizes a large quanti...

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

A Fuzzy Logic Based on Sentiment Classification

Sentiment classification aims to detect information such as opinions and explicit or implicit feelings expressed in text. Most existing approaches can detect either explicit or implicit expressions of sentiment in a text, but only separately. The proposed framework detects both implicit and explicit expressions in meeting transcripts. It will classify the P...

A Novel Fuzzy based Clustering Algorithm for Text Classification

With the flourishing of the World Wide Web and the rapid development of Internet technology, the growing volume of digital textual data has become more and more unmanageable, and text classification has therefore gained significant attention. Text classification poses some specific challenges, such as high dimensionality, with each document (data point) having only a very small subset ...

Using a Fuzzy Clustering Algorithm to Determine Daily Suspended Sediment Load (Case Study: Kasilian Watershed)

In many water resource projects, such as dams, flood control, navigability, and river aesthetics, environmental issues and the estimation of suspended load are of great importance. The complexity of sediment behavior and the inability of mathematical and physical models to simulate sedimentation processes have led to the development of new technologies such as fuzzy logic, which has the ability to identif...

Publication date: 2012